1. A machine executable program for assisting a computing system to effectuate distributed 
voice query recognition comprising: 

a first audio signal receiving routine for receiving user speech utterance signals representing 
speech utterances to be recognized, said speech utterances including sentences comprised of 
one or more words; and 

a first signal processing routine adapted to generate representative speech data values from 
said speech utterance signals, said representative speech data values being characterized by a 
first data content that is substantially inadequate by itself for permitting recognition of words 
articulated in said speech utterance; and 

a formatting routine for rendering said representative speech data values into a format 
suitable for transmission over a communications channel to a second processing routine 
executing on a separate computing system; and 

wherein said first data content in said representative speech data values is used by said second 
processing routine to compute additional data content that when combined with said first data 
content is sufficient for completing recognition of words articulated in said speech utterance at 
said separate computing system. 
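Claim 1 requires a formatting routine that renders the representative speech data values for transmission but leaves the wire format open. The sketch below is one hypothetical layout, assumed purely for illustration: a little-endian header holding a 4-byte frame index and a 1-byte value count, followed by the values as float32. The names `pack_frame` and `unpack_frame` are illustrative, not taken from the source.

```python
import struct
from typing import List, Tuple

# Hypothetical wire layout (not specified by the claims):
# uint32 frame index, uint8 value count, then `count` float32 values, little-endian.
HEADER = struct.Struct("<IB")

def pack_frame(frame_index: int, values: List[float]) -> bytes:
    """Client side: render one frame of representative speech data values as bytes."""
    if len(values) > 255:
        raise ValueError("too many values for a one-byte count field")
    header = HEADER.pack(frame_index, len(values))
    body = struct.pack(f"<{len(values)}f", *values)
    return header + body

def unpack_frame(payload: bytes) -> Tuple[int, List[float]]:
    """Server side: recover the frame index and values from a payload."""
    frame_index, count = HEADER.unpack_from(payload, 0)
    values = list(struct.unpack_from(f"<{count}f", payload, HEADER.size))
    return frame_index, values

if __name__ == "__main__":
    cepstra = [i / 4 for i in range(13)]  # 13 dummy coefficients, exact in float32
    payload = pack_frame(7, cepstra)
    print(len(payload))  # 5-byte header + 13 * 4 bytes = 57
    print(unpack_frame(payload) == (7, cepstra))  # True
```

Only the base values travel in this sketch; under claim 1 the receiving side would compute the additional data content itself, which keeps each payload small.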

2. The program of claim 1, wherein said program works within a browser program executing 
on said computing system as part of a client-server based system. 

3. The program of claim 1, wherein said first signal processing routine is further adapted to 
handle a stream of continuous speech utterances, such that a plurality of representative 
speech data values are generated for each of said speech utterances in real-time. 

4. The program of claim 1, wherein said first data content is sufficiently small that said 
formatting routine can handle in real-time a continuous stream of representative speech 
data values that may be generated for a corresponding stream of speech utterances. 



5. The program of claim 1, wherein each of said representative speech data values 
corresponds to a separate cepstral coefficient value for a corresponding frequency 
component of said user speech utterance signals, and said first data content corresponds to 
a set of said frequency components spanning an audible speech frequency range. 

6. The program of claim 5, wherein said additional data content corresponds to a set of delta 
and acceleration coefficients computed from a corresponding set of said cepstral 
coefficient values. 

7. The program of claim 6, wherein said additional data content is generated by said second 
processing routine with less latency than would result if said additional data 
content were generated by said first signal processing routine. 

8. The program of claim 1, wherein signal processing functions required to generate said first 
data content and said additional data content can be allocated between said first signal 
processing routine and second signal processing routine as needed based on computing 
resources available to said first and second signal processing routines respectively. 

9. The program of claim 1, wherein said first signal processing routine is a set of instructions 
executed by any one of a host microprocessor, an embedded processor, and/or a digital 
signal processor (DSP). 

10. The program of claim 1, wherein said first signal processing routine and said first audio 
signal receiving routine are implemented as part of a desktop computing system, a portable computing 
system, a personal digital assistant (PDA), a cell-phone, and/or an electronic interactive 
toy. 
11. A machine executable program for use in a voice query recognition system that is 
distributed across a client system and a separate server system, the program comprising: 

a first audio signal receiving routine for receiving user speech utterance signals representing 
speech utterances to be recognized during a sequence of speech utterance evaluation time 
frames, said speech utterances including sentences comprised of one or more words; and 

a first signal processing routine adapted to generate representative speech data values for each 
speech utterance evaluation time frame during which speech utterance signals are received; 

a formatting routine for rendering said representative speech data values into a format 
suitable for transmission from the client system over a communications channel to a second 
processing routine executing on the server computing system; and 

wherein said representative speech data values constitute a minimum amount of information 
that can be used by said second processing routine to complete accurate recognition of said 
one or more words and said sentences. 
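Claim 11 speaks of a sequence of speech utterance evaluation time frames without fixing their length or spacing. The sketch below assumes 25 ms frames advanced every 10 ms at an 8 kHz sample rate, values commonly used in speech front ends but not stated in the source, and drops trailing samples that do not fill a whole frame. The function name `evaluation_frames` is hypothetical.

```python
from typing import List, Sequence

def evaluation_frames(samples: Sequence[float], sample_rate: int = 8000,
                      frame_ms: float = 25.0, hop_ms: float = 10.0) -> List[List[float]]:
    """Split audio samples into overlapping evaluation time frames.

    Trailing samples that do not fill a whole frame are dropped (an assumed policy).
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop = int(sample_rate * hop_ms / 1000)          # samples between frame starts
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames

if __name__ == "__main__":
    one_second = [0.0] * 8000
    print(len(evaluation_frames(one_second)))  # 98 frames of 200 samples each
```

Each frame would then be reduced on the client to its representative speech data values (for example, cepstral coefficients) before transmission to the server.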

12. The program of claim 11, wherein said program works within a browser program 
executing on said computing system as part of a client-server based system. 

13. The program of claim 11, wherein each of said representative speech data values 
corresponds to a separate cepstral coefficient value for a corresponding frequency 
component of said user speech utterance signals, and said first data content corresponds to 
a set of said frequency components spanning an audible speech frequency range. 

14. The program of claim 13, wherein said additional data content corresponds to a set of delta 
and acceleration coefficients computed from a corresponding set of said cepstral coefficient 
values. 

15. The program of claim 11, wherein said second processing routine performs accurate 
recognition of said one or more words with less latency than would result if said 
one or more words were recognized by said first signal processing routine. 
16. A distributed voice recognition system comprising: 

a sound processing circuit adapted to receive a speech utterance and to generate associated 
speech utterance signals therefrom; and 

a first signal processing circuit adapted to generate a first set of speech data values from said 
speech utterance signals, said first set of speech data values being insufficient by themselves for 
permitting recognition of words articulated in said speech utterance; and 

a transmission circuit for formatting and transmitting said first set of speech data values over 
a communications channel to a second signal processing circuit; and 

said second signal processing circuit being configured to generate a second set of speech data 
values based on said first set of speech data values, such that said second set of speech data values contain 
sufficient information to be usable by a word recognition engine for recognizing words in said 
speech utterance. 

17. The system of claim 16, wherein said second set of speech data values include said first set 
of speech data values and a derived set of speech data values, which derived set of speech 
data values are computed based on said first speech data values. 

18. The system of claim 17, wherein said first set of data values are MFCC vector coefficients, 
and said derived set of speech data values are MFCC delta coefficients and MFCC 
acceleration coefficients derived from said MFCC vector coefficients. 

19. The system of claim 16, wherein said second set of speech data values can be generated by 
said second signal processing circuit in a time that is less than the combination of a first 
time which would be required by said first signal processing circuit to generate said second 
set of speech data values from said first set of speech data values combined with a second 
time which would be required by said transmission circuit to format and transmit said 
second set of speech data values. 
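Claim 19 compares two ways of producing the second set of speech data values: deriving it on the server, versus deriving it on the client and then formatting and transmitting the larger result. The small check below restates that inequality literally; the timing figures in the example are made-up placeholders, not measurements from the source, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Timings:
    """Hypothetical per-utterance timings in milliseconds."""
    server_derive_ms: float    # server computes the second set from the first set
    client_derive_ms: float    # client would compute the second set itself
    send_second_set_ms: float  # client would format and transmit the larger second set

def server_derivation_is_faster(t: Timings) -> bool:
    """True when the condition recited in claim 19 holds."""
    return t.server_derive_ms < t.client_derive_ms + t.send_second_set_ms

if __name__ == "__main__":
    # Placeholder numbers only: 4 ms on the server versus 12 + 9 ms via the client.
    print(server_derivation_is_faster(Timings(4.0, 12.0, 9.0)))  # True
```

As in the claim's wording, the time to send the smaller first set does not appear on either side of the comparison.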
20. The system of claim 16, wherein said second set of speech data values can be generated by 
said second signal processing circuit in less time than that which would be required by said 
first signal processing circuit to generate said second set of speech data values from said 
first set of speech data values. 

21. The system of claim 16, wherein signal processing responsibilities of said first and second 
signal processing circuits are allocated such that said first signal processing circuit performs 
less than approximately M the required signal processing operations needed to convert said 
speech utterance signals into a form usable by a word recognition engine. 

22. The system of claim 16, wherein signal processing functions required to generate said first 
and second set of speech data values can be allocated between said first signal processing 
circuit and second signal processing circuit as needed based on computing resources 
available to said first and second signal processing circuits respectively. 

23. The system of claim 16, wherein signal processing functions required to generate said first 
and second set of speech data values can be allocated between said first signal processing 
circuit and second signal processing circuit as needed based on computing resources 
available to said first and second signal processing circuits respectively. 

24. The system of claim 16, wherein signal processing functions performed by said first signal 
processing circuit and second signal processing circuit are configured based on: (i) 
computing resources available to said first and second signal processing circuits; (ii) 
performance characteristics of said transmission circuit; and (iii) transmission latencies of 
said communications channel. 
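Claim 24 names three inputs for configuring how signal processing is divided: computing resources, transmission-circuit performance, and channel latency. One way to make this concrete, assumed here purely for illustration, is to model the front end as an ordered pipeline of stages and pick the split point that minimizes estimated end-to-end latency. The `Stage` fields, stage names, costs, and payload sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stage:
    name: str
    client_ms: float   # estimated cost on the client (reflects client resources)
    server_ms: float   # estimated cost on the server (reflects server resources)
    output_bytes: int  # size of this stage's output per utterance

def best_split(stages: List[Stage], input_bytes: int, channel_latency_ms: float,
               bytes_per_ms: float) -> Tuple[int, float]:
    """Return (k, total_ms): run stages[:k] on the client, the rest on the server.

    Total = client compute + channel latency + payload transfer time + server compute,
    where the payload is the output of the last client stage (raw input when k == 0).
    """
    best = (0, float("inf"))
    for k in range(len(stages) + 1):
        payload = stages[k - 1].output_bytes if k > 0 else input_bytes
        total = (sum(s.client_ms for s in stages[:k])
                 + channel_latency_ms
                 + payload / bytes_per_ms
                 + sum(s.server_ms for s in stages[k:]))
        if total < best[1]:
            best = (k, total)
    return best

if __name__ == "__main__":
    pipeline = [
        Stage("framing", 1.0, 0.2, 16000),
        Stage("cepstra", 6.0, 1.0, 5200),
        Stage("deltas", 8.0, 0.5, 15600),  # deltas/accelerations triple the payload
    ]
    print(best_split(pipeline, 16000, 40.0, 200.0))  # (2, 73.5)
```

With these placeholder numbers, the cheapest split sends cepstra and leaves delta computation to the server; a much faster channel can shift the optimum toward doing everything on the server.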

25. The system of claim 16, wherein said first signal processing circuit is also configured to 
assist said second signal processing circuit with signal processing computations required to 
generate said second set of speech data values. 

26. The system of claim 16, wherein said first set of speech data values represent the least 
amount of data that can be used by said second signal processing circuit to generate said 
second set of data values usable for a word recognition process. 
27. A system for assisting a client computing device to perform speech recognition in 
cooperation with a server computing device, the system comprising: 

a speech utterance capture circuit for receiving a speech utterance and generating associated 
speech utterance signals, where said speech utterance can include an articulated sentence of one 
or more articulated words; and 

a speech utterance signal processing circuit, said signal processing circuit being configurable to 
perform data extracting operations on said speech utterance signals to generate a set of 
frequency related speech utterance signals for said articulated sentence; and 

a transmission circuit for coding said set of frequency related speech utterance signals into a 
format suitable for transmission over a communications channel to the server; 

a receiving circuit for receiving a response to said articulated sentence through said 
communications channel from the server, said response being generated by said server using 
said set of frequency related speech utterance signals to perform a word recognition operation 
on said one or more articulated words and a sentence recognition operation on said articulated 
sentence; and 

wherein a latency associated with performing said speech recognition is minimized by 
optimizing an allocation of signal processing responsibilities for said speech utterance signals 
between the client computing device and the server computing device. 
28. A system for assisting a client computing device to perform real-time speech recognition in 
cooperation with a server computing device, the system comprising: 

a sound processing circuit integrated within the client computing device, said sound 
processing circuit being adapted to receive a continuous speech utterance and to generate 
associated speech utterance signals therefrom, wherein said speech utterance can include an 
articulated sentence of one or more articulated words; and 

a first signal processing routine adapted to be executed by the client computing device, and 
which first signal processing routine is further adapted to continuously generate a set of speech 
based vector coefficients as needed from said speech utterance signals; and 

a transmission circuit coupled to the client computing device for coding said set of speech 
based vector coefficients into a format suitable for transmission over a communications 
channel to the server, said set of speech-based vector coefficients being transmitted in real-time 
as said speech utterances occur; 
a receiving circuit coupled to the client computing device for receiving a real-time response 
to said articulated sentence through said communications channel from the server; 

wherein said response is generated by said server substantially on a real-time basis using said 
set of speech based vector coefficients to perform a second signal processing routine which 
completes a word recognition operation on said one or more articulated words, as well as a 
sentence recognition operation on said articulated sentence. 
29. A distributed speech recognition system for processing a speech utterance comprising: 

a first signal processing circuit associated with a client computing system, said first signal 
processing circuit being adapted to generate a first set of speech data values from speech 
utterance signals, wherein said first set of speech data values have a limited data content to 
reduce processing and transmission latencies in the distributed speech recognition system; and 

a second signal processing circuit associated with a separate server computing system, said 
second signal processing circuit being configured to generate a second set of speech data values 
derived from said first set of speech data values, and being further configured to generate a 
combined speech data value set consisting of said second set of speech data values and said first 
set of data values; 

a word recognition circuit adapted to use said combined speech data value set for 
recognizing words in the speech utterance. 

30. The system of claim 29, further including a sentence recognition circuit which recognizes 
an articulated sentence containing said recognized words. 

31. The system of claim 30, wherein said articulated sentence can include one of a number of 
predefined sentences recognizable by said system, and said articulated sentence is 
recognized by identifying a candidate set of potential sentences from said number of 
predefined sentences corresponding to said articulated sentence, and then comparing each 
entry in the candidate set of potential sentences to said articulated sentence to determine a 
matching recognized sentence. 

32. The system of claim 31, wherein said articulated sentence is processed by a natural 
language engine operating on said recognized words. 

33. The system of claim 32, wherein said articulated sentence is compared against said 
candidate set of potential sentences by examining noun phrases. 

34. The system of claim 31, wherein said candidate set of potential sentences are determined in 
part by a context dictionary loaded by said sentence recognition circuit in response to an 
operating environment presented by said system to a user. 
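Claims 31 and 34 describe a two-step sentence recognition: narrow the predefined sentences to a candidate set, partly by a context dictionary tied to the current operating environment, then compare each candidate with the articulated sentence. The sketch assumes a simple shared-word filter for the first step and the similarity ratio of Python's `difflib.SequenceMatcher` for the comparison; the source names neither technique, and the sentences and context dictionary below are invented examples.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

def recognize_sentence(words: List[str], predefined: List[str],
                       context_dictionary: Dict[str, List[str]],
                       environment: str) -> Optional[str]:
    """Pick the predefined sentence that best matches the recognized (lowercase) words.

    If the environment has no context dictionary entry, all predefined sentences are allowed.
    """
    allowed = set(context_dictionary.get(environment, predefined))
    spoken = set(words)
    # Step 1: candidate set = allowed sentences sharing at least one word with the utterance.
    candidates = [s for s in predefined
                  if s in allowed and spoken & set(s.lower().split())]
    if not candidates:
        return None
    # Step 2: compare each candidate with the articulated sentence and keep the best match.
    utterance = " ".join(words)
    return max(candidates,
               key=lambda s: SequenceMatcher(None, utterance, s.lower()).ratio())

if __name__ == "__main__":
    sentences = ["What is the return policy", "What is the shipping cost", "Where is my order"]
    context = {"store_help": ["What is the return policy", "What is the shipping cost"]}
    print(recognize_sentence(["what", "is", "the", "shipping", "cost"],
                             sentences, context, "store_help"))  # What is the shipping cost
```

A production system would more likely compare parsed structures (claim 33 mentions noun phrases) than raw character similarity; the string ratio is only a stand-in.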



35. A system for recognizing speech information, comprising: 

storage means for storing one or more words to be recognized by the system based on the 
speech information; and 

means for capturing speech signals corresponding to the speech information; and 

first processing means for generating partially recognized speech data from said speech 
signals, said first processing means performing a first signal processing operation on said 
speech signals, said first signal processing operation being insufficient to permit said partially 
recognized speech data to be correlated with said one or more words; and 

second processing means for generating recognizable speech data from said partially 
recognized speech data, using a second signal processing operation, such that said recognizable 
speech data can be correlated with said one or more words, said second processing means 
being distinct and physically separated from said first processing means; and 

a non-permanent data transmission connection coupling said first and second processing 
means; 

transmitting means for transmitting said partially recognized speech data from said 
first processing means through said non-permanent data transmission connection to said 
second processing means; 

wherein said speech information is correlated to said one or more words based on said first 
and second signal processing operations. 

36. The system of claim 35, wherein said first processing means is located at a client site, and 
said second processing means is located at a remote server site. 

37. The system of claim 36, wherein said storage means is also located at said server site. 

38. The system of claim 35, wherein said non-permanent connection is either a circuit 
switched or packet switched connection. 

39. The system of claim 35, wherein said non-permanent connection includes an intranet 
network linking said first and second processing means. 



40. The system of claim 35, wherein said non-permanent connection is a wireless 
communications channel. 



41. The system of claim 35, wherein said first signal processing operation includes an 
operation for extracting spectral parameter vectors from said speech signals. 

42. The system of claim 41, wherein said first signal processing operation further includes an 
operation for decomposing the spectral parameter vectors with a Mel-frequency transfer 
process to obtain MFCC coefficients for said spectral parameter vectors. 
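Claim 42 recites a Mel-frequency transfer process that turns spectral parameter vectors into MFCC coefficients. A conventional version of that step, assumed here, applies triangular filters spaced evenly on the mel scale (mel = 2595 * log10(1 + f / 700)) to a power spectrum, takes the log of each filter energy, and applies an unnormalized DCT-II. The filter count (20), coefficient count (13), and the 1e-10 floor added before the log are illustrative choices, not values from the source.

```python
import math
from typing import List

def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters: int, num_bins: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters over spectrum bins 0..num_bins-1 spanning 0 Hz to Nyquist."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # num_filters + 2 equally spaced mel points give each filter a left, centre and right edge.
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * nyquist / (num_bins - 1) for k in range(num_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for f in bin_hz:
            if left < f <= centre:
                weights.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                weights.append((right - f) / (right - centre))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def mfcc_from_power_spectrum(power: List[float], sample_rate: float,
                             num_filters: int = 20, num_coeffs: int = 13) -> List[float]:
    """Mel filterbank energies -> log -> DCT-II, keeping the first num_coeffs terms."""
    bank = mel_filterbank(num_filters, len(power), sample_rate)
    log_energies = [math.log(sum(w * p for w, p in zip(weights, power)) + 1e-10)
                    for weights in bank]
    m = len(log_energies)
    return [sum(e * math.cos(math.pi * c * (i + 0.5) / m) for i, e in enumerate(log_energies))
            for c in range(num_coeffs)]

if __name__ == "__main__":
    flat = [1.0] * 257  # power spectrum of a 512-point frame at 16 kHz (dummy data)
    print(len(mfcc_from_power_spectrum(flat, 16000.0)))  # 13
```

Scaling the whole spectrum by a constant shifts only the first coefficient, which is one reason the zeroth term is often treated as an energy measure.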

43. The system of claim 41, wherein said partially recognized speech data comprises an 
observation vector O_t which includes said MFCC coefficients and delta and acceleration 
coefficients obtained from said spectral parameter vectors. 

44. The system of claim 1, wherein said partially recognized speech data includes an 
observation vector O_t, and said second signal processing operation includes a Viterbi decoding 
operation for mapping a sequence of said observation vectors O_t to a speech data 
symbol. 
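Claim 44 maps a sequence of observation vectors O_t to a speech data symbol by Viterbi decoding. The sketch below is the textbook log-domain Viterbi recursion for a hidden Markov model, simplified by assuming the observations have already been quantized to discrete codebook indices; real recognizers typically score continuous vectors, for example with Gaussian mixtures. The two-state model and its probabilities are made-up numbers.

```python
import math
from typing import List, Sequence, Tuple

def viterbi(obs: Sequence[int], start: List[float], trans: List[List[float]],
            emit: List[List[float]]) -> Tuple[List[int], float]:
    """Most likely HMM state path for a non-empty `obs` and its log probability.

    start[i]    = P(state i at t = 0)
    trans[i][j] = P(state j at t + 1 | state i at t)
    emit[i][o]  = P(observation symbol o | state i)
    """
    def log(p: float) -> float:
        return math.log(p) if p > 0 else -math.inf

    n = len(start)
    score = [log(start[i]) + log(emit[i][obs[0]]) for i in range(n)]
    back: List[List[int]] = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev_best, new_score = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            prev_best.append(i_best)
            new_score.append(score[i_best] + log(trans[i_best][j]) + log(emit[j][o]))
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    last = max(range(n), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

if __name__ == "__main__":
    # Hypothetical model: state 0 = silence, state 1 = speech; symbol 0 = low, 1 = high energy.
    start = [0.8, 0.2]
    trans = [[0.7, 0.3], [0.2, 0.8]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    path, logp = viterbi([0, 1, 1, 0], start, trans, emit)
    print(path)  # [0, 1, 1, 0]
```

Turning the winning state path into a speech data symbol (for example, a phone or word label attached to each model) is left to the surrounding recognizer and is not shown.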

45. The system of claim 10, wherein said second signal processing operation further includes a 
text conversion operation for converting said speech data symbol into one or more text 
words. 

46. The system of claim 11, wherein said second processing means performs a query 
for determining which of said one or more words correspond to said one or more text 
words. 

47. The system of claim 1, wherein said second processing means further utilizes environment 
variables for determining which of said one or more text words correspond to said speech 
information. 



48. A method of performing voice recognition comprising the steps of: 

(a) receiving user speech utterance signals representing speech utterances to be recognized, 
said speech utterances including sentences comprised of one or more words; and 

(b) performing a partial recognition of said one or more words contained in said speech 
utterance signals with a first computing device to generate representative speech data 
values; and 

(c) formatting said representative speech data values into a format suitable for transmission 
over a communications channel from said first computing device to a second computing 
device; and 

wherein said representative speech data values contain sufficient data content such that a 
complete recognition of said one or more words can be completed by said second computing 
device. 

49. The method of claim 48, wherein said data content is sufficiently small that said partial 
recognition and formatting can be performed in real-time so as to generate a continuous 
stream of representative speech data values. 

50. The method of claim 48, wherein complete recognition of said one or more words is 
achieved with less latency than that resulting if said complete recognition were performed 
by said first computing device. 

51. The method of claim 48, wherein signal processing functions required to perform said 
partial recognition can be allocated between said first computing device and said second 
computing device as needed based on computing resources available to said first and 
second computing devices respectively. 

52. The method of claim 48, wherein said first computing device is part of a client computing 
system, and said second computing device is part of a server computing system, and said 
communications channel is a network. 
53. A method of performing distributed voice recognition comprising the steps of: 

(a) receiving user speech utterance signals representing speech utterances to be recognized 
during a sequence of speech utterance evaluation time frames, said speech utterances 
including sentences comprised of one or more words; and 

(b) generating representative speech data values with a first processing circuit for each speech 
utterance evaluation time frame during which speech utterance signals are received; 

(c) encoding said representative speech data values into a format suitable for transmission over 
a communications channel to a second processing circuit; and 

wherein said representative speech data values constitute a minimum amount of information 
that can be used by said second processing circuit to complete accurate recognition of said one 
or more words and said sentences. 

54. The method of claim 53, wherein said recognition of said one or more words occurs in 
real-time. 

55. The method of claim 53, wherein each of said representative speech data values 
corresponds to a separate cepstral coefficient value for a corresponding frequency 
component of said user speech utterance signals, and said minimum amount of 
information corresponds to a set of cepstral coefficients for frequency components 
spanning an audible speech frequency range. 

56. The method of claim 55, wherein a set of delta and acceleration coefficients is computed 
from said cepstral coefficient values to complete recognition of said one or more words 
and said sentences. 

57. The method of claim 53, wherein said second processing circuit performs accurate 
recognition of said one or more words with less latency than would result if said 
one or more words were recognized by said first processing circuit. 
58. A method of performing distributed speech recognition using a first computing device and 
a second computing device, the method comprising the steps of: 

(a) receiving a speech utterance at the first computing device; and 

(b) generating associated speech utterance signals from said speech utterance with the 
first computing device; and 

(c) generating a first set of speech data values from said speech utterance signals at the 
first computing device, said first set of speech data values being insufficient by 
themselves for permitting recognition of words articulated in said speech utterance; 
and 

(d) formatting said first set of speech data values at the first computing device to be 
compatible with a communications protocol used by a communications channel 
coupled to the first computing device; 
(e) transmitting said first set of speech data values through said channel to the second 
computing device; and 

(f) generating a second set of speech data values based on said first set of speech data values, such 
that said second set of speech data values contain sufficient information to be usable by 
a word recognition engine for recognizing words in said speech utterance. 

59. The method of claim 58, wherein said second set of speech data values include said first set 
of speech data values and a derived set of speech data values, which derived set of speech 
data values are computed based on said first speech data values. 

60. The method of claim 58, wherein said second set of speech data values can be generated by 
said second computing device in a time that is less than the combination of a first time 
which would be required by said first computing device to generate said second set of 
speech data values from said first set of speech data values combined with a second time 
which would be required to format and transmit said second set of speech data values. 
61. The method of claim 58, wherein signal processing responsibilities of said first and second 
computing devices are allocated such that said first computing device performs less than 
approximately one-fourth of the required signal processing operations needed to convert said speech 
utterance signals into a form usable by a word recognition engine. 

62. The method of claim 58, wherein signal processing functions performed by said first and 
second computing devices are configured based on: (i) computing resources available to 
said first and second computing devices; and (ii) transmission latencies of said 
communications channel. 

63. The method of claim 58, wherein said first computing device is also configured to assist 
said second computing device with signal processing computations required to generate 
said second set of speech data values. 

64. The method of claim 58, wherein said first set of speech data values represent the least 
amount of data that can be used by said second computing device to generate said second set 
of data values usable for a word recognition process. 



65. A method of performing distributed recognition of a speech utterance comprising the steps 
of: 

(a) generating a first set of speech data values from speech utterance signals at a first 
computing system, wherein said first set of speech data values have a limited data 
content to reduce processing and transmission latencies; and 

(b) generating a second set of speech data values derived from said first set of speech data 
values at a second computing system, said second computing system being 
independently operable from said first computing system; and 

(c) generating a combined speech data value set at said second computing system 
consisting of said second set of speech data values and said first set of data values; 

(d) generating a list of recognized words in said speech utterance. 

66. The method of claim 65, further including a step: (e) recognizing an articulated sentence 
containing said list of recognized words. 

67. The method of claim 66, wherein said articulated sentence can include one of a number of 
predefined recognizable sentences, and said articulated sentence is recognized by 
identifying a candidate set of potential sentences from said number of predefined sentences 
corresponding to said articulated sentence, and then comparing each entry in the candidate 
set of potential sentences to said articulated sentence to determine a matching recognized 
sentence. 

68. The method of claim 67, wherein said articulated sentence is processed by a natural 
language engine operating on said recognized words. 

69. The method of claim 68, wherein said articulated sentence is compared against said 
candidate set of potential sentences by examining noun phrases. 

70. The method of claim 67, wherein said candidate set of potential sentences are determined 
in part by a context dictionary loaded in response to an operating environment presented 
to a user articulating said sentence. 



